Abstract

The paper describes a parser for Categorial Grammar which provides fully word by word incremental interpretation. The parser does not require fragments of sentences to form constituents, and thereby avoids problems of spurious ambiguity. The paper includes a brief discussion of the relationship between basic Categorial Grammar and other formalisms such as HPSG, Dependency Grammar and the Lambek Calculus. It also includes a discussion of some of the issues which arise when parsing lexicalised grammars, and the possibilities for using statistical techniques for tuning to particular languages.

1 Introduction 

There is a large body of psycholinguistic evidence which suggests that meaning can be extracted before the end of a sentence, and before the end of phrasal constituents (e.g. Marslen-Wilson 1973, Tanenhaus et al. 1990). There is also recent evidence suggesting that, during speech processing, partial interpretations can be built extremely rapidly, even before words are completed (Spivey-Knowlton et al. 1994)¹. There are also potential computational applications for incremental interpretation, including early parse filtering using statistics based on logical form plausibility, and interpretation of fragments of dialogues (a survey is provided by Milward and Cooper, 1994, henceforth referred to as M&C).

In the current computational and psycholinguistic literature there are two main approaches to the incremental construction of logical forms. One approach is to use a grammar with 'non-standard' constituency, so that an initial fragment of a sentence, such as John likes, can be treated as a constituent, and hence be assigned a type and a semantics. This approach is exemplified by Combinatory Categorial Grammar, CCG (Steedman 1991), which takes a basic CG with just application, and adds various new ways of combining elements together². Incremental interpretation can then be achieved using a standard bottom-up shift-reduce parser, working from left to right along the sentence. The alternative approach, exemplified by the work of Stabler on top-down parsing (Stabler 1991), and Pulman on left-corner parsing (Pulman 1986), is to associate a semantics directly with the partial structures formed during a top-down or left-corner parse. For example, a syntax tree missing a noun phrase, such as the following

      s
     / \
   np   vp
  John  / \
       v   np"
     likes

can be given a semantics as a function from entities to truth values, i.e. λx.likes(john,x), without having to say that John likes is a constituent.

This research was supported by the U.K. Science and Engineering Research Council, grant RR30718. I am grateful to Patrick Sturt, Carl Vogel, and the reviewers for comments on an earlier version.

1 Spivey-Knowlton et al. reported 3 experiments. One showed effects before the end of a word when there was no other appropriate word with the same initial phonology. Another showed on-line effects from adjectives and determiners during noun phrase processing.

Neither approach is without problems. If a grammar is augmented with operations which are powerful enough to make most initial fragments constituents, then there may be unwanted interactions with the rest of the grammar (examples of this in the case of CCG and the Lambek Calculus are given in Section 2). The addition of extra operations also means that, for any given reading of a sentence, there will generally be many different possible derivations (so-called 'spurious' ambiguity), making simple parsing strategies such as shift-reduce highly inefficient.

2 Note that CCG doesn't provide a type for all initial fragments of sentences. For example, it gives a type to John thinks Mary, but not to John thinks each. In contrast the Lambek Calculus (Lambek 1958) provides an infinite number of types for any initial sentence fragment.

The limitations of the parsing approaches become evident when we consider grammars with left recursion. In such cases a simple top-down parser will be incomplete, and a left-corner parser will resort to buffering the input (so won't be fully word-by-word). M&C illustrate the problem by considering the fragment Mary thinks John. This has a small number of possible semantic representations (the exact number depending upon the grammar), e.g.

λP.thinks(mary,P(john))
λP.λQ.Q(thinks(mary,P(john)))
λP.λR.(R(λx.thinks(x,P(john))))(mary)

The second representation is appropriate if the sentence finishes with a sentential modifier. The third allows there to be a verb phrase modifier.
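To make the difference between the three representations concrete, here is a minimal Python sketch (an illustration, not part of the original proposal) that encodes each logical form as a closure building a formula string. The continuations shaves, probably and yesterday are hypothetical stand-ins for an embedded verb phrase, a sentential modifier and a verb phrase modifier.

```python
# Each representation of "Mary thinks John" is a function awaiting the rest
# of the embedded sentence (P), encoded here as closures over formula strings.

def rep1(P):
    # λP.thinks(mary,P(john))
    return f"thinks(mary,{P('john')})"


def rep2(P):
    # λP.λQ.Q(thinks(mary,P(john))): room for a sentential modifier Q
    return lambda Q: Q(f"thinks(mary,{P('john')})")


def rep3(P):
    # λP.λR.(R(λx.thinks(x,P(john))))(mary): room for a verb phrase modifier R
    return lambda R: R(lambda x: f"thinks({x},{P('john')})")("mary")


# Hypothetical continuations used only for illustration.
shaves = lambda x: f"shaves({x})"                        # embedded verb phrase
probably = lambda s: f"probably({s})"                    # sentential modifier
yesterday = lambda vp: lambda x: f"yesterday({vp(x)})"   # verb phrase modifier

print(rep1(shaves))             # thinks(mary,shaves(john))
print(rep2(shaves)(probably))   # probably(thinks(mary,shaves(john)))
print(rep3(shaves)(yesterday))  # yesterday(thinks(mary,shaves(john)))
```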

If the semantic representation is to be read off syntactic structure, then the parser must provide a single syntax tree (possibly with empty nodes). However, there are actually any number of such syntax trees corresponding to, for example, the first semantic representation, since the np and the s can be arbitrarily far apart. The following tree is suitable for the sentence Mary thinks John shaves but not for e.g. Mary thinks John coming here was a mistake.

      s
     / \
   np   vp
  Mary  / \
       v   s
  thinks  / \
         np  vp"
        John

M&C suggest various possibilities for packing the partial syntax trees, including using Tree Adjoining Grammar (Joshi 1987) or Description Theory (Marcus et al. 1983). One further possibility is to choose a single syntax tree, and to use destructive tree operations later in the parse³.

3 This might turn out to be similar to one view of Tree Adjoining Grammar, where adjunction adds into a pre-existing well-formed tree structure. It is also closer to some methods for incremental adaptation of discourse structures, where additions are allowed to the right-frontier of a tree structure (e.g. Polanyi and Scha 1984). There are however problems with this kind of approach when features are considered (see e.g. Vijay-Shanker 1992).

The approach which we will adopt here is based on Milward (1992, 1994). Partial syntax trees can be regarded as performing two main roles. The first is to provide syntactic information which guides how the rest of the sentence can be integrated into the tree. The second is to provide a basis for a semantic representation. The first role can be captured using syntactic types, where each type corresponds to a potentially infinite number of partial syntax trees. The second role can be captured by the parser constructing semantic representations directly. The general processing model therefore consists of transitions of the form:

Syntactic type_i    --word-->    Syntactic type_{i+1}
Semantic rep._i                  Semantic rep._{i+1}

This provides a state-transition or dynamic model of 
processing, with each state being a pair of a syntactic 
type and a semantic value. 
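The shape of this processing model can be sketched as a driver loop. In the following minimal Python sketch, parse_incrementally, toy_step and the TOY table are hypothetical names; the table simply records the transitions of the John likes Sue example (with semantic values written as strings) rather than computing them, since the actual transition rules are only introduced in Section 4.

```python
from typing import Callable, Iterable, List, Tuple

State = Tuple[str, str]                        # (syntactic type, semantic value)
Step = Callable[[State, str], Iterable[State]]


def parse_incrementally(initial: State, words: List[str], step: Step) -> List[State]:
    """Advance every live state on each input word. There is no stack and no
    transition without a word: each word maps each state to its successors."""
    states = [initial]
    for word in words:
        states = [nxt for state in states for nxt in step(state, word)]
    return states


# Hypothetical table copying the John likes Sue example: transitions are looked
# up, not computed, so it only illustrates the state-transition shape.
TOY = {
    ("s/s", "John"): [("s/(np\\s)", "λH.H(john')")],
    ("s/(np\\s)", "likes"): [("s/np", "λY.likes'(john',Y)")],
    ("s/np", "Sue"): [("s", "likes'(john',sue')")],
}


def toy_step(state: State, word: str) -> List[State]:
    return TOY.get((state[0], word), [])


print(parse_incrementally(("s/s", "λQ.Q"), ["John", "likes", "Sue"], toy_step))
# [('s', "likes'(john',sue')")]
```

Each intermediate list of states is a complete description of the parser after the prefix read so far, which is what allows an interpretation to be read off after every word.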

The main difference between our approach and that of Milward (1992, 1994) is that it is based on a more expressive grammar formalism, Applicative Categorial Grammar, as opposed to Lexicalised Dependency Grammar. Applicative Categorial Grammars allow categories to have arguments which are themselves functions (e.g. very can be treated as a function of a function, and given the type (n/n)/(n/n) when used as an adjectival modifier). The ability to deal with functions of functions has advantages in enabling more elegant linguistic descriptions, and in providing one kind of robust parsing: the parser never fails until the last word, since there could always be a final word which is a function over all the constituents formed so far. However, there is a corresponding problem of far greater nondeterminism, with even unambiguous words allowing many possible transitions. It therefore becomes crucial either to perform some kind of ambiguity packing, or to use language tuning. This will be discussed in the final section of the paper.

2 Applicative Categorial Grammar 

Applicative Categorial Grammar is the most basic form of Categorial Grammar, with just a single combination rule corresponding to function application. It was first applied to linguistic description by Ajdukiewicz and Bar-Hillel in the 1950s. Although it is still used for linguistic description (e.g. Bouma and van Noord, 1994), it has been somewhat overshadowed in recent years by HPSG (Pollard and Sag 1994), and by Lambek Categorial Grammars (Lambek 1958). It is therefore worth giving some brief indications of how it fits in with these developments.

The first directed Applicative CG was proposed by Bar-Hillel (1953). Functional types included a list of arguments to the left, and a list of arguments to the right. Translating Bar-Hillel's notation into a feature-based notation similar to that in HPSG (Pollard and Sag 1994), we obtain the following category for a ditransitive verb such as put:



$$\begin{bmatrix} s \\ \mathrm{l}\langle np\rangle \\ \mathrm{r}\langle np, pp\rangle \end{bmatrix}$$



The list of arguments to the left is gathered under the feature l, and those to the right, an np and a pp in that order, under the feature r.
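As a data structure, such a non-curried category is just a result type with two argument lists. A minimal Python sketch, assuming a hypothetical Cat class whose left and right fields stand for the l and r features (both kept in string order):

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Cat:
    """Non-curried category: a result type with left (l) and right (r) argument lists."""
    head: str
    left: Tuple["Cat", ...] = ()    # the l feature
    right: Tuple["Cat", ...] = ()   # the r feature

    def __str__(self) -> str:
        if not self.left and not self.right:
            return self.head  # a basic category such as np

        def show(args: Tuple["Cat", ...]) -> str:
            return ", ".join(str(a) for a in args)

        return f"[{self.head}; l<{show(self.left)}>; r<{show(self.right)}>]"


NP, PP = Cat("np"), Cat("pp")
put = Cat("s", left=(NP,), right=(NP, PP))   # ditransitive put
print(put)  # [s; l<np>; r<np, pp>]
```

Making the class frozen lets categories be compared and hashed by value, which the matching of arguments against lists relies on.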

Bar-Hillel employed a single application rule, 
which corresponds to the following: 



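$$L_1 \ldots L_m \;+\; \begin{bmatrix} X \\ \mathrm{l}\langle L_1, \ldots, L_m\rangle \\ \mathrm{r}\langle R_1, \ldots, R_n\rangle \end{bmatrix} \;+\; R_1 \ldots R_n \;\Rightarrow\; X$$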
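The rule can be read as a single all-or-nothing check. A minimal Python sketch, reusing a hypothetical Cat class with left and right lists in string order; apply_all is an illustrative name:

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Cat:
    head: str
    left: Tuple["Cat", ...] = ()    # the l feature, in string order
    right: Tuple["Cat", ...] = ()   # the r feature, in string order


def apply_all(left: Sequence[Cat], functor: Cat, right: Sequence[Cat]) -> Optional[Cat]:
    """Bar-Hillel-style application: the functor must find its whole left list
    and its whole right list at once, and the result is the primitive type X."""
    if tuple(left) == functor.left and tuple(right) == functor.right:
        return Cat(functor.head)
    return None


NP, PP = Cat("np"), Cat("pp")
put = Cat("s", left=(NP,), right=(NP, PP))
print(apply_all([NP], put, [NP, PP]))   # Cat(head='s', left=(), right=())
print(apply_all([NP], put, [NP]))       # None: no partial application
```

Because the result is always a primitive category, there is no way in this sketch to build an intermediate functional category such as the one a verb phrase would need.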



The result was a system which comes very close to the formalised dependency grammars of Gaifman (1965) and Hays (1964). The only real difference is that Bar-Hillel allowed arguments to themselves be functions. For example, an adverb such as slowly could be given the type⁴



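$$\begin{bmatrix} s \\ \mathrm{l}\left\langle np,\; \begin{bmatrix} s \\ \mathrm{l}\langle np\rangle \\ \mathrm{r}\langle\rangle \end{bmatrix} \right\rangle \\ \mathrm{r}\langle\rangle \end{bmatrix}$$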



An unfortunate aspect of Bar-Hillel's first system was that the application rule only ever resulted in a primitive type. Hence, arguments with functional types had to correspond to single lexical items: there was no way to form the type np\s⁵ for a non-lexical verb phrase such as likes Mary.

Rather than adapting the Application Rule to allow functions to be applied to one argument at a time, Bar-Hillel's second system (often called AB Categorial Grammar, or Ajdukiewicz/Bar-Hillel CG, Bar-Hillel 1964) adopted a 'Curried' notation, and this has been adopted by most CGs since. To represent a function which requires an np on the left, and an np and a pp to the right, there is a choice of the following three types using Curried notation:

np\((s/pp)/np) 
(np\(s/pp))/np 
((np\s)/pp)/np 

Most CGs either choose the third of these (to give a vp structure), or include a rule of Associativity which means that the types are interchangeable (in the Lambek Calculus, Associativity is a consequence of the calculus, rather than being specified separately).

4 The reformulation is not entirely faithful here to Bar-Hillel, who used a slightly problematic 'double slash' notation for functions of functions.

5 Lambek notation (Lambek 1958).
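The third, vp-structured type can be derived mechanically from the non-curried category. A minimal Python sketch, assuming a hypothetical Cat class with left and right lists in string order; curry_vp builds a Lambek-notation string in which all right arguments are absorbed before any left argument:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Cat:
    head: str
    left: Tuple["Cat", ...] = ()    # in string order
    right: Tuple["Cat", ...] = ()   # in string order


def paren(t: str) -> str:
    """Bracket a compound type when it is embedded in a larger one."""
    return f"({t})" if ("/" in t or "\\" in t) else t


def curry_vp(cat: Cat) -> str:
    """Curried Lambek-notation type that combines with all right arguments
    before any left argument, giving a verb-phrase-like structure."""
    t = cat.head
    for a in cat.left:             # the left argument nearest the functor ends up outermost
        t = f"{paren(curry_vp(a))}\\{paren(t)}"
    for a in reversed(cat.right):  # the first right argument is absorbed first
        t = f"{paren(t)}/{paren(curry_vp(a))}"
    return t


NP, PP = Cat("np"), Cat("pp")
put = Cat("s", left=(NP,), right=(NP, PP))
print(curry_vp(put))  # ((np\s)/pp)/np
```

The other two types in the list above correspond to absorbing the left argument at a different point; under Associativity all three are interchangeable.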

The main impetus to change Applicative CG came from the work of Ades and Steedman (1982). Ades and Steedman noted that the use of function composition allows CGs to deal with unbounded dependency constructions. Function composition enables a function to be applied to its argument, even if that argument is incomplete, e.g.

s/pp + pp/np -> s/np

This allows peripheral extraction, where the 'gap' is at the start or the end of e.g. a relative clause. Variants of the composition rule were proposed in order to deal with non-peripheral extraction, but this led to unwanted effects elsewhere in the grammar (Bouma 1987). Subsequent treatments of non-peripheral extraction based on the Lambek Calculus (where standard composition is built in: it is a rule which can be proven from the calculus) have either introduced an alternative to the forward and backward slashes, i.e. / and \ for normal arguments and | for wh-arguments (Moortgat 1988), or have introduced so-called modal operators on the wh-argument (Morrill et al. 1990). Both techniques can be thought of as marking the wh-arguments as requiring special treatment, and therefore do not lead to unwanted effects elsewhere in the grammar.

However, there are problems with having just composition, the most basic of the non-applicative operations. In CGs which contain functions of functions (such as very, or slowly), the addition of composition adds both new analyses of sentences, and new strings to the language. This is because composition can be used to form a function, which can then be used as an argument to a function of a function. For example, if the two types n/n and n/n are composed to give the type n/n, then this can be modified by an adjectival modifier of type (n/n)/(n/n). Thus, the noun very old dilapidated car can get the unacceptable bracketing [[very [old dilapidated]] car]. Associative CGs with Composition, or the Lambek Calculus, also allow strings such as boy with the to be given the type n/n, predicting very boy with the car to be an acceptable noun. Although it might be possible to rule out individual examples using appropriate features, it is difficult to see how to do this in general whilst retaining a calculus suitable for incremental interpretation.
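The interaction between composition and functions of functions can be reproduced in a few lines. A minimal Python sketch, covering forward slashes only; the tuple encoding and the names fwd, app, compose and show are assumptions made for illustration:

```python
from typing import Optional, Tuple, Union

# A curried type is an atom or ("/", result, argument); backward slashes omitted.
Type = Union[str, Tuple[str, "Type", "Type"]]


def fwd(x: Type, y: Type) -> Type:
    """Build the forward functor type x/y."""
    return ("/", x, y)


def app(f: Type, a: Type) -> Optional[Type]:
    """Forward application: X/Y + Y -> X."""
    if isinstance(f, tuple) and f[0] == "/" and f[2] == a:
        return f[1]
    return None


def compose(f: Type, g: Type) -> Optional[Type]:
    """Forward composition: X/Y + Y/Z -> X/Z."""
    if isinstance(f, tuple) and isinstance(g, tuple) and f[0] == g[0] == "/" and f[2] == g[1]:
        return fwd(f[1], g[2])
    return None


def show(t: Type) -> str:
    if isinstance(t, str):
        return t
    wrap = lambda u: show(u) if isinstance(u, str) else f"({show(u)})"
    return f"{wrap(t[1])}/{wrap(t[2])}"


N = "n"
adj = fwd(N, N)       # old, dilapidated: n/n
very = fwd(adj, adj)  # very: (n/n)/(n/n)

print(show(compose(fwd("s", "pp"), fwd("pp", "np"))))   # s/np
old_dilapidated = compose(adj, adj)                       # n/n, formed by composition
very_old_dilapidated = app(very, old_dilapidated)         # [very [old dilapidated]]
print(show(app(very_old_dilapidated, N)))                # n: the unwanted bracketing
```

Nothing in the two rules distinguishes a lexical n/n from one built by composition, which is exactly why the unwanted bracketing goes through.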

If wh-arguments need to be treated specially anyway (to deal with non-peripheral extraction), and if composition as a general rule is problematic, this suggests we should perhaps return to grammars which use just Application as a general operation, but have a special treatment for wh-arguments. Using the non-Curried notation of Bar-Hillel, it is more natural to use a separate wh-list than to mark wh-arguments individually. For example, the category appropriate for relative clauses with a noun phrase gap would be:



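$$\begin{bmatrix} s \\ \mathrm{l}\langle\rangle \\ \mathrm{r}\langle\rangle \\ \mathrm{w}\langle np\rangle \end{bmatrix}$$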



It is then possible to specify operations which act as purely applicative operations with respect to the left and right argument lists, but more like composition with respect to the wh-list. This is very similar to the way in which wh-movement is dealt with in GPSG (Gazdar et al. 1985) and HPSG, where wh-arguments are treated using slash mechanisms or feature inheritance principles which correspond closely to function composition.

Given that our arguments have produced a categorial grammar which looks very similar to HPSG, why not use HPSG rather than Applicative CG? The main reason is that Applicative CG is a much simpler formalism, which can be given a very simple syntax-semantics interface, with function application in syntax mapping to function application in semantics⁶ ⁷. This in turn makes it relatively easy to provide proofs of soundness and completeness for an incremental parsing algorithm. Ultimately, some of the techniques developed here should be able to be extended to more complex formalisms such as HPSG.



6 One area where application based approaches to semantic combination gain in simplicity over unification based approaches is in providing semantics for functions of functions. Moore (1989) provides a treatment of functions of functions in a unification based approach, but only by explicitly incorporating lambda expressions. Pollard and Sag (1994) deal with some functions of functions, such as non-intersective adjectives, by explicit set construction.

7 As discussed above, wh-movement requires something more like composition than application. A simple syntax-semantics interface can be retained if the same operation is used in both syntax and semantics. Wh-arguments can be treated as similar to other arguments, i.e. as lambda abstracted in the semantics. For example, the fragment John found a woman who Mary can be given the semantics λP.∃x.woman(x) & found(john,x) & P(mary,x), where P is a function from a left argument Mary of type e and a wh-argument, also of type e.



3 AB Categorial Grammar with Associativity (AACG)

In this section we define a grammar similar to Bar-Hillel's first grammar. However, unlike Bar-Hillel, we allow one argument to be absorbed at a time. The resulting grammar is equivalent to AB Categorial Grammar plus associativity.

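The categories of the grammar are defined as follows:

1. If X is a syntactic type (e.g. s, np), then

$$\begin{bmatrix} X \\ \mathrm{l}\langle\rangle \\ \mathrm{r}\langle\rangle \end{bmatrix}$$

is a category.

2. If X is a syntactic type, and L and R are lists of categories, then

$$\begin{bmatrix} X \\ \mathrm{l}L \\ \mathrm{r}R \end{bmatrix}$$

is a category.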



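Application to the right is defined by the rule⁸:

$$\begin{bmatrix} X \\ \mathrm{l}L \\ \mathrm{r}\langle R_1\rangle \bullet R \end{bmatrix} \;+\; R_1 \;\Rightarrow\; \begin{bmatrix} X \\ \mathrm{l}L \\ \mathrm{r}R \end{bmatrix}$$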



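Application to the left is defined by the rule:

$$L_1 \;+\; \begin{bmatrix} X \\ \mathrm{l}\langle L_1\rangle \bullet L \\ \mathrm{r}R \end{bmatrix} \;\Rightarrow\; \begin{bmatrix} X \\ \mathrm{l}L \\ \mathrm{r}R \end{bmatrix}$$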



The basic grammar provides some spurious derivations, since sentences such as John likes Mary can be bracketed as either ((John likes) Mary) or (John (likes Mary)). However, we will see that these spurious derivations do not translate into spurious ambiguity in the parser, which maps from strings of words directly to semantic representations.
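The two AACG rules, and the two derivations of John likes Mary, can be checked with a minimal Python sketch. The Cat class and the names apply_right and apply_left are hypothetical; following the rules as written, each application consumes the first element of the relevant list:

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Cat:
    head: str
    left: Tuple["Cat", ...] = ()    # the l list
    right: Tuple["Cat", ...] = ()   # the r list


def apply_right(f: Cat, arg: Cat) -> Optional[Cat]:
    """[X; l L; r <R1> . R] + R1  =>  [X; l L; r R]"""
    if f.right and f.right[0] == arg:
        return replace(f, right=f.right[1:])
    return None


def apply_left(arg: Cat, f: Cat) -> Optional[Cat]:
    """L1 + [X; l <L1> . L; r R]  =>  [X; l L; r R]"""
    if f.left and f.left[0] == arg:
        return replace(f, left=f.left[1:])
    return None


NP, S = Cat("np"), Cat("s")
likes = Cat("s", left=(NP,), right=(NP,))

d1 = apply_right(apply_left(NP, likes), NP)   # ((John likes) Mary)
d2 = apply_left(NP, apply_right(likes, NP))   # (John (likes Mary))
print(d1 == d2 == S)  # True: two derivations, one category
```

Both bracketings end in the same category, which is the spurious ambiguity at issue; the parser below avoids enumerating them separately.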

4 An Incremental Parser 

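Most parsers which work left to right along an input string can be described in terms of state transitions, i.e. by rules which say how the current parsing state (e.g. a stack of categories, or a chart) can be transformed by the next word into a new state. Here this will be made particularly explicit, with the parser described in terms of just two rules which take a state and a new word and create a new state⁹. There are two unusual features. Firstly, there is nothing equivalent to a stack mechanism: at all times the state is characterised by a single syntactic type, and a single semantic value, not by some stack of semantic values or syntax trees which are waiting to be connected together. Secondly, all transitions between states occur on the input of a new word: there are no 'empty' transitions (such as the reduce step of a shift-reduce parser).

8 • is list concatenation, e.g. ⟨np⟩ • ⟨s⟩ equals ⟨np, s⟩.

9 This approach is described in greater detail in Milward (1994), where parsers are specified formally in terms of their dynamics.

State-Application:

$$\begin{bmatrix} Y \\ \mathrm{l}\langle\rangle \\ \mathrm{r}\left\langle \begin{bmatrix} X \\ \mathrm{l}L_0 \\ \mathrm{r}R_0 \\ \mathrm{h}H \end{bmatrix} \right\rangle \bullet R_2 \\ \mathrm{h}\langle\rangle \end{bmatrix} : F \;\xrightarrow{\;W\;}\; \begin{bmatrix} Y \\ \mathrm{l}\langle\rangle \\ \mathrm{r}R_1 \bullet R_2 \\ \mathrm{h}\langle\rangle \end{bmatrix} : \lambda r_1.\,F(G(r_1))$$

where W: a category of type X with left list L, and semantics G.

State-Prediction: from a state of the same form, with semantics F, on input of a word

$$W: \begin{bmatrix} Z \\ \mathrm{l}L_1 \bullet L \\ \mathrm{r}R_1 \bullet R \\ \mathrm{h}\langle\rangle \end{bmatrix} : G$$

the new state expects R₁, followed by a predicted category of type X whose left list begins with [Z; l L; r R; h⟨⟩], followed by R₂; its semantics is

$$\lambda r_1.\,(\lambda h.\,F(\lambda l_1.\,(h(\lambda r.\,(((G\,r_1)\,r)\,l_1)))))$$

Figure 1: Transition Rules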

The two rules, which are given in Figure 1¹⁰, are difficult to understand in their most general form. Here we will work up to the rules gradually, by considering which kinds of rules we might need in particular instances. Consider the following pairing of sentence fragments with their simplest possible CG type:

Mary thinks: s/s 
Mary thinks John: s/(np\s) 
Mary thinks John likes: s/np 
Mary thinks John likes Sue: s 



10 Lᵢ, Rᵢ and Hᵢ are lists of categories. lᵢ and rᵢ are lists of variables, of the same length as the corresponding Lᵢ and Rᵢ.



Now consider taking each type as a description of 
the state that the parser is in after absorbing the 
fragment. We obtain a sequence of transitions as 
follows: 



s/s  --"John"-->  s/(np\s)  --"likes"-->  s/np  --"Sue"-->  s



If an embedded sentence such as John likes Sue is a 
mapping from an s/s to an s, this suggests that it 
might be possible to treat all sentences as mapping 
from some category expecting an s to that category 
i.e. from X/s to X. Similarly, all noun phrases might 
be treated as mappings from an X/np to an X. 

Now consider individual transitions. The simplest 
of these is where the type of argument expected by 
the state is matched by the next word i.e. 



s/np  --"Sue"-->  s      where: Sue: np



This can be generalised to the following rule, which is similar to Function Application in standard CG¹¹:

11 It differs in not being a rule of grammar: here the functor is a state category and the argument is a lexical category. In standard CG function application, the functor and argument can correspond to a word or a phrase.



X/Y  --W-->  X      where: W: Y

A similar transition occurs for likes. Here an np\s 
was expected, but likes only provides part of this: it 
requires an np to the right to form an np\s. Thus 
after likes is absorbed the state category will need 
to expect an np. The rule required is similar to 
Function Composition in CG i.e. 

X/Y  --W-->  X/Z      where: W: Y/Z

Considering this informally in terms of tree structures, what is happening is the replacement of an empty node in a partial tree by a second partial tree, i.e.



    X                         X
   / \                       / \
  U   Y"    +    Y    =>    U   Y
                / \            / \
               V   Z"         V   Z"



The two rules specified so far need to be further generalised to allow for the case where a lexical item has more than one argument (e.g. if we replace likes by a ditransitive such as gives or a tritransitive such as bets). This is relatively trivial using a non-Curried notation similar to that used for AACG. What we obtain is the single rule of State-Application, which corresponds to application when the list of arguments, R₁, is empty, to function composition when R₁ is of length one, and to n-ary composition when R₁ is of length n. The only change needed from AACG notation is the inclusion of an extra feature list, the h-list, which stores information about which arguments are waiting for a head (the reasons for this will be explained later). The lexicon is identical to that for a standard AACG, except for having h-lists which are always set to empty.

Now consider the first transition. Here a sentence 
was expected, but what was encountered was a noun 
phrase, John. The appropriate rule in CG notation 
would be: 

X/Y  --W-->  X/(Z\Y)      where: W: Z

This rule states that if the parser is looking for a Y and gets a Z, then it should look for a Y which is missing a Z. In tree structure terms we have:

    X                    X
   / \                  / \
  U   Y"    +   Z  =>  U   Y
                          / \
                         Z   Z\Y"



The rule of State-Prediction is obtained by further 
generalising to allow the lexical item to have missing 
arguments, and for the expected argument to have 
missing arguments. 
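The three CG-style rules can be tried out directly. The following Python sketch implements only their curried special cases (application, composition, and the simple prediction X/Y + Z → X/(Z\Y)), not the general State-Application and State-Prediction rules of Figure 1; the type encoding, the names step and parse, and the lexicon are assumptions made for illustration. The semantic combinations (F(G), λz.F(G z) and λh.F(h(G))) are chosen so as to reproduce the semantic values shown in Figure 2.

```python
from typing import Callable, Iterator, List, Tuple, Union

# Curried types (hypothetical encoding): an atom such as "np", ("/", X, Y) for X/Y,
# or ("\\", Y, X) for Y\X, i.e. an X missing a Y on its left.
Type = Union[str, Tuple[str, "Type", "Type"]]
State = Tuple[Type, Callable]          # (syntactic type, semantic value)


def over(x: Type, y: Type) -> Type:    # X/Y
    return ("/", x, y)


def under(y: Type, x: Type) -> Type:   # Y\X
    return ("\\", y, x)


def step(state: State, wtype: Type, wsem) -> Iterator[State]:
    """Curried special cases of the transition rules for a state X/Y:
    application X/Y + Y -> X, composition X/Y + Y/Z -> X/Z,
    prediction X/Y + Z -> X/(Z\\Y)."""
    (_, x, y), F = state
    if wtype == y:
        yield x, F(wsem)
    if isinstance(wtype, tuple) and wtype[0] == "/" and wtype[1] == y:
        yield over(x, wtype[2]), (lambda z: F(wsem(z)))
    yield over(x, under(wtype, y)), (lambda h: F(h(wsem)))


def parse(words: List[str], lexicon) -> List[str]:
    states: List[State] = [(over("s", "s"), lambda q: q)]   # start: expect an s
    for w in words:
        wtype, wsem = lexicon[w]
        states = [nxt for st in states
                  if isinstance(st[0], tuple) and st[0][0] == "/"
                  for nxt in step(st, wtype, wsem)]
    # A parse succeeds when the final state expects no more arguments.
    return [sem for typ, sem in states if typ == "s"]


NP = "np"
VP = under(NP, "s")                                          # np\s
LEXICON = {
    "John": (NP, "john"), "Mary": (NP, "mary"), "Sue": (NP, "sue"),
    "likes": (over(VP, NP), lambda y: lambda x: f"likes({x},{y})"),
    "thinks": (over(VP, "s"), lambda p: lambda x: f"thinks({x},{p})"),
}

print(parse("John likes Sue".split(), LEXICON))              # ['likes(john,sue)']
print(parse("Mary thinks John likes Sue".split(), LEXICON))  # ['thinks(mary,likes(john,sue))']
```

Running Mary thinks John likes Sue passes through exactly the state types listed earlier (s/s, s/(np\s), s/np, s). In this simplified setting likes yields two successor states (composition and one prediction), rather than the five of the full rules discussed below.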

State-Application and State-Prediction together provide the basis of a sound and complete parser¹². Parsing of sentences is achieved by starting in a state expecting a sentence, and applying the rules non-deterministically as each word is input. A successful parse is achieved if the final state expects no more arguments. As an example, reconsider the string John likes Sue. The sequence of transitions corresponding to John likes Sue being a sentence is given in Figure 2. The transition on encountering John is deterministic: State-Application cannot apply, and State-Prediction can only be instantiated one way. The result is a new state expecting an argument which, given an np, could give an s, i.e. an np\s.

The transition on input of likes is non-deterministic. State-Application can apply as in Figure 2. However, State-Prediction can also apply, and can be instantiated in four ways (these correspond to different ways of cutting up the left and right subcategorisation lists of the lexical entry, likes, i.e. as ⟨np⟩ • ⟨⟩ or ⟨⟩ • ⟨np⟩). One possibility corresponds to the prediction of an s\s modifier, a second to the prediction of an (np\s)\(np\s) modifier (i.e. a verb phrase modifier), a third to there being a function which takes the subject and the verb as separate arguments, and the fourth corresponds to there being a function which requires an s/np argument. The second of these is perhaps the most interesting, and is given in Figure 3. It is the choice of this particular transition at this point which allows verb phrase modification, and hence, assuming the next word is Sue, an implicit bracketing of the string fragment as (John (likes Sue)). Note that if State-Application is chosen, or the first of the State-Prediction possibilities, the fragment John likes Sue retains a flat structure. If there is to be no modification of the verb phrase, no verb phrase structure is introduced. This relates to there being no spurious ambiguity: each choice of transition has semantic consequences; each choice affects whether a particular part of the semantics is to be modified or not.
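The four instantiations for likes come from independently cutting its l-list and r-list into a prefix and a suffix (L₁ • L and R₁ • R). A minimal Python sketch that enumerates these candidate cuts; cuts and prediction_cuts are hypothetical names, and the sketch does not model the further matching conditions of Figure 1, so it only counts candidates:

```python
from itertools import product
from typing import List, Sequence, Tuple

Cut = Tuple[Tuple[str, ...], Tuple[str, ...]]


def cuts(args: Sequence[str]) -> List[Cut]:
    """All ways of cutting a subcategorisation list into a prefix and a suffix."""
    return [(tuple(args[:i]), tuple(args[i:])) for i in range(len(args) + 1)]


def prediction_cuts(left: Sequence[str], right: Sequence[str]) -> List[Tuple[Cut, Cut]]:
    """Candidate (L1 . L, R1 . R) splits of a lexical entry's l and r lists."""
    return list(product(cuts(left), cuts(right)))


for (l1, l_rest), (r1, r_rest) in prediction_cuts(["np"], ["np"]):   # likes
    print(f"L1={l1} L={l_rest}  R1={r1} R={r_rest}")
```

The number of candidates grows as (|l| + 1) × (|r| + 1), so a ditransitive with l⟨np⟩ and r⟨np, pp⟩ already has six, which illustrates the nondeterminism discussed in the final section.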

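12 The parser accepts the same strings as the grammar and assigns them the same semantic values. This is slightly different from the standard notion of soundness and completeness of a parser, where the parser accepts the same strings as the grammar and assigns them the same syntax trees.

$$\begin{bmatrix} s \\ \mathrm{l}\langle\rangle \\ \mathrm{r}\langle s\rangle \\ \mathrm{h}\langle\rangle \end{bmatrix} : \lambda Q.Q \;\xrightarrow{\text{``John''}}\; \begin{bmatrix} s \\ \mathrm{l}\langle\rangle \\ \mathrm{r}\left\langle \begin{bmatrix} s \\ \mathrm{l}\langle np\rangle \\ \mathrm{r}\langle\rangle \\ \mathrm{h}\langle np\rangle \end{bmatrix} \right\rangle \\ \mathrm{h}\langle\rangle \end{bmatrix} : \lambda H.(H(john'))$$

$$\xrightarrow{\text{``likes''}}\; \begin{bmatrix} s \\ \mathrm{l}\langle\rangle \\ \mathrm{r}\langle np\rangle \\ \mathrm{h}\langle\rangle \end{bmatrix} : \lambda Y.likes'(john',Y) \;\xrightarrow{\text{``Sue''}}\; \begin{bmatrix} s \\ \mathrm{l}\langle\rangle \\ \mathrm{r}\langle\rangle \end{bmatrix} : likes'(john',sue')$$

Figure 2: Possible state transitions

$$\begin{bmatrix} s \\ \mathrm{l}\langle\rangle \\ \mathrm{r}\left\langle np,\; \begin{bmatrix} s \\ \mathrm{l}\left\langle \begin{bmatrix} s \\ \mathrm{l}\langle np\rangle \\ \mathrm{r}\langle\rangle \end{bmatrix}, np \right\rangle \\ \mathrm{r}\langle\rangle \\ \mathrm{h}\langle\rangle \end{bmatrix} \right\rangle \\ \mathrm{h}\langle\rangle \end{bmatrix} : \lambda Y.\lambda K.(K(\lambda X.likes'(X,Y)))(john')$$

where W:

$$\begin{bmatrix} s \\ \mathrm{l}\langle np\rangle \\ \mathrm{r}\langle np\rangle \\ \mathrm{h}\langle\rangle \end{bmatrix} : \lambda Y.\lambda X.likes'(X,Y)$$

Figure 3: Example instantiation of State-Prediction

Finally it is worth noting why it is necessary to use h-lists. These are needed to distinguish between cases of real functional arguments (of functions of functions), and functions formed by State-Prediction. Consider the following trees, where the np\s node is empty.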

        s                          s
       / \                        / \
    s/s   s                     np   np\s
         / \                        /    \
       np   np\s"       (np\s)/(np\s)    np\s"

Both trees have the same syntactic type; however, in the first case we want to allow for there to be an s\s modifier of the lower s, but not in the second. The headed list distinguishes between the two cases, with only the first having an np on its headed list, allowing prediction of an s modifier.

5 Parsing Lexicalised Grammars 

When we consider full sentence processing, as opposed to incremental processing, the use of lexicalised grammars has a major advantage over the use of more standard rule based grammars. In processing a sentence using a lexicalised formalism we do not have to look at the grammar as a whole, but only at the grammatical information indexed by each of the words. Thus increases in the size of a grammar don't necessarily affect efficiency of processing, provided the increase in size is due to the addition of new words, rather than increased lexical ambiguity. Once the full set of possible lexical entries for a sentence is collected, they can, if required, be converted back into a set of phrase structure rules (which should correspond to a small subset of the rule based formalism equivalent to the whole lexicalised grammar), before being parsed with a standard algorithm such as Earley's (Earley 1970).

In incremental parsing we cannot predict which words will appear in the sentence, so cannot use the same technique. However, if we are to base a parser on the rules given above, it would seem that we gain further. Instead of grammatical information being localised to the sentence as a whole, it is localised to a particular word in its particular context: there is no need to consider a pp as a start of a sentence if it occurs at the end, even if there is a verb with an entry which allows for a subject pp.

However, there is a major problem. As we noted in the last paragraph, it is the nature of parsing incrementally that we don't know what words are to come next. But here the parser doesn't even use the information that the words are to come from a lexicon for a particular language. For example, given an input of 3 nps, the parser will happily create a state expecting 3 nps to the left. This might be a likely state for, say, a head-final language, but an unlikely state for a language such as English. Note that incremental interpretation will be of no use here, since the semantic representation should be no more or less plausible in the different languages. In practical terms, a naive interactive parallel Prolog implementation on a current workstation fails to be interactive in a real sense after about 8 words¹³.

What seems to be needed is some kind of language tuning¹⁴. This could be in the nature of fixed restrictions to the rules, e.g. for English we might rule out uses of prediction when a noun phrase is encountered and two already exist on the left list. A more appealing alternative is to base the tuning on statistical methods. This could be achieved by running the parser over corpora to provide probabilities of particular transitions given particular words. These probabilities would capture the likelihood of a word having a particular part of speech, and the probability of a particular transition being performed with that part of speech.
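A minimal sketch of such estimates, assuming the parser's choices over a corpus have been recorded as (word, part of speech, transition) triples. The function names, the factorisation into P(part of speech | word) × P(transition | part of speech), and the observations themselves are hypothetical illustrations using plain relative frequencies:

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Table = Dict[str, Dict[str, float]]


def normalise(counts: Dict[str, Counter]) -> Table:
    """Turn raw counts into relative frequencies within each condition."""
    return {k: {v: n / sum(c.values()) for v, n in c.items()} for k, c in counts.items()}


def estimate(observations: Iterable[Tuple[str, str, str]]) -> Tuple[Table, Table]:
    """Estimate P(part of speech | word) and P(transition | part of speech)
    from (word, part of speech, transition) triples."""
    pos_given_word: Dict[str, Counter] = defaultdict(Counter)
    trans_given_pos: Dict[str, Counter] = defaultdict(Counter)
    for word, pos, trans in observations:
        pos_given_word[word][pos] += 1
        trans_given_pos[pos][trans] += 1
    return normalise(pos_given_word), normalise(trans_given_pos)


def transition_prob(word: str, pos: str, trans: str, p_pos: Table, p_trans: Table) -> float:
    return p_pos.get(word, {}).get(pos, 0.0) * p_trans.get(pos, {}).get(trans, 0.0)


# Hypothetical observations from a parsed corpus (labels are illustrative only).
obs = [("saw", "verb", "application"), ("saw", "verb", "application"),
       ("saw", "noun", "prediction"), ("likes", "verb", "prediction")]
p_pos, p_trans = estimate(obs)
print(round(transition_prob("saw", "verb", "application", p_pos, p_trans), 3))  # 0.444
```

Conditioning transitions on the part of speech rather than on the word itself lets rare words share statistics with frequent words of the same category.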

There has already been some early work done on providing statistically based parsing using transitions between recursively structured syntactic categories (Tugwell 1995)¹⁵. Unlike a simple Markov process, there are a potentially infinite number of states, so there is inevitably a problem of sparse data. It is therefore necessary to make various generalisations over the states, for example by ignoring the R₂ lists.

13 This result should however be treated with some caution: in this implementation there was no attempt to perform any packing of different possible transitions, and the algorithm has exponential complexity. In contrast, a packed recogniser based on a similar, but much simpler, incremental parser for Lexicalised Dependency Grammar has O(n³) time complexity (Milward 1994) and good practical performance, taking a couple of seconds on 30 word sentences.

14 The usage of the term language tuning is perhaps broader here than its use in the psycholinguistic literature to refer to different structural preferences between languages, e.g. for high versus low attachment (Mitchell et al. 1992).

15 Tugwell's approach does however differ in that the state transitions are not limited by the rules of State-Prediction and State-Application. This has advantages in allowing the grammar to learn phenomena such as heavy NP shift, but has the disadvantage of suffering
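One way to picture the generalisation is as a back-off key for counting. A minimal Python sketch, assuming a hypothetical state encoding of a result type plus the tuple of expected right arguments, where everything after the first expected argument plays the role of R₂:

```python
from collections import Counter
from typing import Tuple

# Hypothetical state encoding: (result type, expected right arguments).
# The first expected argument is kept; the rest plays the role of R2.
State = Tuple[str, Tuple[str, ...]]


def generalise(state: State) -> Tuple[str, str]:
    """Back-off key that ignores R2, keeping only the next expected argument."""
    head, expected = state
    return head, expected[0] if expected else ""


counts = Counter(generalise(s) for s in [
    ("s", ("np",)), ("s", ("np", "pp")), ("s", ("np", "pp", "pp")), ("s", ("s",))])
print(counts[("s", "np")])  # 3
```

Three distinct states collapse to one key, so statistics gathered for any of them are shared.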

The full processing model can then be either serial, exploring the most highly ranked transitions first (but allowing backtracking if the semantic plausibility of the current interpretation drops too low), or ranked parallel, exploring just the n paths ranked highest according to the transition probabilities and semantic plausibility.
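The ranked parallel option amounts to a beam search over transitions. A minimal Python sketch; ranked_parallel, toy_step and the plausibility hook are hypothetical, with scores taken as summed log-probabilities and a plausibility of zero treated as ruling a state out:

```python
import heapq
import math
from typing import Callable, Hashable, Iterable, List, Tuple

Step = Callable[[Hashable, str], Iterable[Tuple[Hashable, float]]]


def ranked_parallel(start, words: List[str], step: Step, n: int,
                    plausibility: Callable[[Hashable], float] = lambda s: 1.0):
    """Keep only the n highest-scoring paths after each word.

    step(state, word) yields (next_state, transition probability); plausibility(state)
    stands in for a semantic plausibility score. Scores are summed log-probabilities."""
    beam: List[Tuple[float, Hashable]] = [(0.0, start)]
    for word in words:
        candidates = [(score + math.log(p) + math.log(plausibility(nxt)), nxt)
                      for score, state in beam
                      for nxt, p in step(state, word)
                      if p > 0 and plausibility(nxt) > 0]
        beam = heapq.nlargest(n, candidates, key=lambda c: c[0])
    return beam


def toy_step(state: str, word: str):
    # Hypothetical: every word allows two transitions, A (p=0.6) and B (p=0.4).
    yield state + "A", 0.6
    yield state + "B", 0.4


print([s for _, s in ranked_parallel("", ["w1", "w2"], toy_step, n=2)])
```

The serial alternative corresponds to n = 1 plus backtracking, which this sketch does not implement.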

6 Conclusion 

The paper has presented a method for providing interpretations word by word for basic Categorial Grammar. The final section contrasted parsing with lexicalised and rule based grammars, and argued that statistical language tuning is particularly suitable for incremental, lexicalised parsing strategies.